Speed up telemetry check by sharing a single TS program across roots !! #267651
Previously each telemetry root (10 total) created its own ts.createProgram + getTypeChecker(), redundantly resolving the same shared Kibana transitive dependencies. This accounted for ~212s of sequential CPU time. This change collects all collector file paths across all roots upfront, creates a single shared TypeScript program, then partitions the parsed results back by root. Globs also now run in parallel via Promise.all. Local benchmark: 210s → 59s (~72% reduction). Co-authored-by: Cursor <cursoragent@cursor.com>
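The shared-program idea can be sketched without the TypeScript compiler itself. In this minimal sketch, `buildProgram` and `extractFromFile` are hypothetical callbacks. They stand in for `ts.createProgram(...)` + `getTypeChecker()` and for the tool's per-file collector extraction; the names and shapes are assumptions for illustration, not the actual Kibana code. The sketch shows the core pattern: the expensive build runs once over the union of all roots' paths, and the results are then partitioned back by root.

```typescript
// Hypothetical sketch: one expensive "program" shared by every root.
// buildProgram/extractFromFile stand in for ts.createProgram(...) and the
// tool's per-file collector extraction; neither is the real Kibana code.

interface SharedProgramDeps<P, R> {
  // Expensive: should run exactly once with every path from every root.
  buildProgram: (allPaths: string[]) => P;
  // Cheap: extract results for one file using the shared program.
  extractFromFile: (program: P, filePath: string) => R[];
}

function extractAcrossRoots<P, R>(
  pathsByRoot: Record<string, string[]>,
  deps: SharedProgramDeps<P, R>
): Record<string, R[]> {
  // 1. Gather every collector path up front, de-duplicated across roots.
  const allPaths: string[] = [];
  const seen = new Set<string>();
  for (const root of Object.keys(pathsByRoot)) {
    for (const p of pathsByRoot[root]) {
      if (!seen.has(p)) {
        seen.add(p);
        allPaths.push(p);
      }
    }
  }

  // 2. Pay the program construction cost a single time.
  const program = deps.buildProgram(allPaths);

  // 3. Partition per-file results back to the root that listed each file.
  const resultsByRoot: Record<string, R[]> = {};
  for (const root of Object.keys(pathsByRoot)) {
    const results: R[] = [];
    for (const p of pathsByRoot[root]) {
      results.push(...deps.extractFromFile(program, p));
    }
    resultsByRoot[root] = results;
  }
  return resultsByRoot;
}
```

Files listed under more than one root are passed to the build only once, but they are still looked up for each root that lists them. As a result, every root gets back exactly the results for its own files.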
💛 Build succeeded, but was flaky
Pinging @elastic/actionable-obs-team (Team:actionable-obs)
Approvability verdict: Needs human review
Performance optimization refactor for internal telemetry tooling. All changed files are owned by @elastic/kibana-core and the author is not a designated owner, so designated code owners should review.
afharo left a comment
Thank you for the speed boost. I left two thoughts on potential further improvements.
Happy to approve if you prefer to address them in a follow-up PR.
const restrictedProgramPaths = programPaths.filter((programPath) =>
  fullRestrictedPaths.includes(programPath)
return [
  {
This is awesome! I wonder if we can make it even faster by defining concurrent tasks:
- One task that creates the program
- A set of parallel tasks that run extractCollectorsWithProgram with the shared program created by the first task
Thanks for the suggestion! I benchmarked this: extractCollectorsWithProgram across all 10 roots takes 0.06s total (most roots have 0-5 collectors; the largest two have 26 and 30). Since it's purely CPU-bound synchronous work (AST traversal + type checker lookups), Promise.all in single-threaded Node.js wouldn't actually parallelize it: each synchronous call would simply run to completion before the next one starts. True parallelism would need worker_threads, but a ts.Program can't be serialized across threads.
The bottleneck is createKibanaProgram at 53s (95.9% of the task). Extraction is negligible by comparison, so I'll leave this as-is.
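The reasoning above about Promise.all can be checked with a small experiment. In a single Node.js thread, an async callback runs synchronously until its first `await`. Wrapping non-yielding, CPU-bound calls in `Promise.all` therefore does not overlap them. In this sketch, `extractSync` is a hypothetical stand-in for synchronous extraction work, and the start/end log records the order in which the calls actually execute.

```typescript
// Hypothetical stand-in for a synchronous, CPU-bound step such as
// per-root AST extraction. It never awaits, so it never yields.
function extractSync(root: string, log: string[]): number {
  log.push(`start ${root}`);
  let acc = 0;
  for (let i = 0; i < 100000; i++) {
    acc = (acc + i) % 97; // busy work
  }
  log.push(`end ${root}`);
  return acc;
}

async function runAll(roots: string[]): Promise<string[]> {
  const log: string[] = [];
  // roots.map invokes each async callback in turn; each one runs
  // extractSync to completion before map moves on to the next root.
  await Promise.all(roots.map(async (root) => extractSync(root, log)));
  return log;
}
```

For example, `runAll(["a", "b"])` resolves to `["start a", "end a", "start b", "end b"]`, a strictly sequential log with no interleaving.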
export function filterCollectorPaths(fullPaths: string[]): string[] {
  return fullPaths.filter((p) => COLLECTOR_RE.test(readFileSync(p, 'utf-8')));
}
I think there's potential to make this function async, which could cut more time.
Good call. filterCollectorPaths currently calls readFileSync on all 36,308 globbed files, which takes 1.79s. Making it async with fs.promises.readFile + Promise.all would overlap the I/O and could cut that roughly in half.
That said, it's ~3% of the total task time (53s is createKibanaProgram), so the absolute savings would be ~1s. Happy to do it in a follow-up for cleanliness!
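Here is a minimal sketch of the async variant discussed above. The collector regex is passed in because the real `COLLECTOR_RE` pattern is not shown in this thread, and `filterCollectorPathsAsync` and its `concurrency` parameter are hypothetical names. Instead of a single `Promise.all` over all 36,308 reads, it uses a small pool of workers. Opening tens of thousands of files at once can risk exhausting file descriptors (`EMFILE`). The default limit of 64 is an arbitrary choice, not a measured value.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical async variant of filterCollectorPaths.
// Assumes collectorRe has no `g`/`y` flag, so .test() is stateless.
async function filterCollectorPathsAsync(
  fullPaths: string[],
  collectorRe: RegExp,
  concurrency = 64
): Promise<string[]> {
  const matches: boolean[] = new Array(fullPaths.length).fill(false);
  let next = 0;

  // Each worker claims the next unread index until none remain. The
  // `next++` claim is synchronous, so two workers never share an index.
  async function worker(): Promise<void> {
    while (next < fullPaths.length) {
      const i = next++;
      const content = await fs.readFile(fullPaths[i], "utf-8");
      matches[i] = collectorRe.test(content);
    }
  }

  const poolSize = Math.min(concurrency, fullPaths.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));

  // Preserve the input order, matching the synchronous version.
  return fullPaths.filter((_, i) => matches[i]);
}
```

The worker pool keeps at most `concurrency` reads in flight. Results are still returned in input order, so callers see the same output as the synchronous filter.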
Starting backport for target branches: 8.19, 9.3, 9.4 https://github.com/elastic/kibana/actions/runs/25446329284
💔 Some backports could not be created
Note: Successful backport PRs will be merged automatically after passing CI.
Manual backport
To create the backport manually, run:
Questions?
Please refer to the Backport tool documentation.
…roots !! (#267651) (#268009) # Backport This will backport the following commits from `main` to `9.4`: - [Speed up telemetry check by sharing a single TS program across roots !! (#267651)](#267651) <!--- Backport version: 9.6.6 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) <!--BACKPORT [{"author":{"name":"Shahzad","email":"shahzad31comp@gmail.com"},"sourceCommit":{"committedDate":"2026-05-06T15:58:35Z","message":"Speed up telemetry check by sharing a single TS program across roots !! (#267651)\n\n## Summary\n\nThe CI telemetry check (`node scripts/telemetry_check`) was creating a\nseparate TypeScript program (`ts.createProgram` + `getTypeChecker()`)\nfor each of the 10 telemetry roots. Since each program independently\nresolves the same shared Kibana transitive dependencies, this resulted\nin ~212s of redundant sequential CPU work.\n\nThis PR:\n- **Creates a single shared TS program** for all 69 collector files\nacross all roots, then partitions the parsed results back by root\n- **Parallelizes globbing** across all roots via `Promise.all`\n- **Extracts `filterCollectorPaths` and `extractCollectorsWithProgram`**\nas reusable functions\n\n### Benchmark results (local, 3 consistent runs)\n\n| Metric | Before | After | Improvement |\n|--------|--------|-------|-------------|\n| Total wall time | **210s** | **59s** | **-151s (72%)** |\n| TS type-check time | 212s (6 programs) | 50s (1 program) | -162s |\n| Glob phase | 0.7s (sequential) | 0.3s (parallel) | -0.4s |\n\n### Validation\n\n- `node scripts/telemetry_check` — passes (no changes mode)\n- `node scripts/telemetry_check --fix` — passes, correctly detects and\nfixes schema drift\n- `schema_checks.test.ts` — all 13 tests pass\n- `kbn-telemetry-tools` unit tests — all 8 suites / 45 tests pass\n- Tested with a real collector change (added field to CSP collector) to\nverify end-to-end detection, fix, and schema JSON update\n\n## Test plan\n\n- [ ] CI 
telemetry check passes on this PR\n- [ ] Verify telemetry check correctly detects schema drift when a\ncollector is modified\n- [ ] Verify `--fix` correctly updates the JSON schema files\n- [ ] Verify `--path` flag still works for scoped checks\n\nMade with [Cursor](https://cursor.com)\n\n---------\n\nCo-authored-by: Cursor <cursoragent@cursor.com>\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"010a02792a057c8088e0cac0f9680f3ac611be97","branchLabelMapping":{"^v9.5.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","backport:all-open","Team:actionable-obs","author:actionable-obs","ci:use-selective-testing","v9.5.0"],"title":"Speed up telemetry check by sharing a single TS program across roots !!","number":267651,"url":"https://github.com/elastic/kibana/pull/267651","mergeCommit":{"message":"Speed up telemetry check by sharing a single TS program across roots !! (#267651)\n\n## Summary\n\nThe CI telemetry check (`node scripts/telemetry_check`) was creating a\nseparate TypeScript program (`ts.createProgram` + `getTypeChecker()`)\nfor each of the 10 telemetry roots. 
Since each program independently\nresolves the same shared Kibana transitive dependencies, this resulted\nin ~212s of redundant sequential CPU work.\n\nThis PR:\n- **Creates a single shared TS program** for all 69 collector files\nacross all roots, then partitions the parsed results back by root\n- **Parallelizes globbing** across all roots via `Promise.all`\n- **Extracts `filterCollectorPaths` and `extractCollectorsWithProgram`**\nas reusable functions\n\n### Benchmark results (local, 3 consistent runs)\n\n| Metric | Before | After | Improvement |\n|--------|--------|-------|-------------|\n| Total wall time | **210s** | **59s** | **-151s (72%)** |\n| TS type-check time | 212s (6 programs) | 50s (1 program) | -162s |\n| Glob phase | 0.7s (sequential) | 0.3s (parallel) | -0.4s |\n\n### Validation\n\n- `node scripts/telemetry_check` — passes (no changes mode)\n- `node scripts/telemetry_check --fix` — passes, correctly detects and\nfixes schema drift\n- `schema_checks.test.ts` — all 13 tests pass\n- `kbn-telemetry-tools` unit tests — all 8 suites / 45 tests pass\n- Tested with a real collector change (added field to CSP collector) to\nverify end-to-end detection, fix, and schema JSON update\n\n## Test plan\n\n- [ ] CI telemetry check passes on this PR\n- [ ] Verify telemetry check correctly detects schema drift when a\ncollector is modified\n- [ ] Verify `--fix` correctly updates the JSON schema files\n- [ ] Verify `--path` flag still works for scoped checks\n\nMade with [Cursor](https://cursor.com)\n\n---------\n\nCo-authored-by: Cursor <cursoragent@cursor.com>\nCo-authored-by: kibanamachine 
<42973632+kibanamachine@users.noreply.github.com>","sha":"010a02792a057c8088e0cac0f9680f3ac611be97"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v9.5.0","branchLabelMappingKey":"^v9.5.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/267651","number":267651,"mergeCommit":{"message":"Speed up telemetry check by sharing a single TS program across roots !! (#267651)\n\n## Summary\n\nThe CI telemetry check (`node scripts/telemetry_check`) was creating a\nseparate TypeScript program (`ts.createProgram` + `getTypeChecker()`)\nfor each of the 10 telemetry roots. Since each program independently\nresolves the same shared Kibana transitive dependencies, this resulted\nin ~212s of redundant sequential CPU work.\n\nThis PR:\n- **Creates a single shared TS program** for all 69 collector files\nacross all roots, then partitions the parsed results back by root\n- **Parallelizes globbing** across all roots via `Promise.all`\n- **Extracts `filterCollectorPaths` and `extractCollectorsWithProgram`**\nas reusable functions\n\n### Benchmark results (local, 3 consistent runs)\n\n| Metric | Before | After | Improvement |\n|--------|--------|-------|-------------|\n| Total wall time | **210s** | **59s** | **-151s (72%)** |\n| TS type-check time | 212s (6 programs) | 50s (1 program) | -162s |\n| Glob phase | 0.7s (sequential) | 0.3s (parallel) | -0.4s |\n\n### Validation\n\n- `node scripts/telemetry_check` — passes (no changes mode)\n- `node scripts/telemetry_check --fix` — passes, correctly detects and\nfixes schema drift\n- `schema_checks.test.ts` — all 13 tests pass\n- `kbn-telemetry-tools` unit tests — all 8 suites / 45 tests pass\n- Tested with a real collector change (added field to CSP collector) to\nverify end-to-end detection, fix, and schema JSON update\n\n## Test plan\n\n- [ ] CI telemetry check passes on this PR\n- [ ] Verify telemetry check correctly detects schema drift when 
a\ncollector is modified\n- [ ] Verify `--fix` correctly updates the JSON schema files\n- [ ] Verify `--path` flag still works for scoped checks\n\nMade with [Cursor](https://cursor.com)\n\n---------\n\nCo-authored-by: Cursor <cursoragent@cursor.com>\nCo-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>","sha":"010a02792a057c8088e0cac0f9680f3ac611be97"}}]}] BACKPORT--> Co-authored-by: Shahzad <shahzad31comp@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
…roots !! (#267651) (#268008) # Backport This will backport the following commits from `main` to `9.3`: - [Speed up telemetry check by sharing a single TS program across roots !! (#267651)](#267651) <!--- Backport version: 9.6.6 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sorenlouv/backport) Co-authored-by: Shahzad <shahzad31comp@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>
…!! (elastic#267651) ## Summary The CI telemetry check (`node scripts/telemetry_check`) was creating a separate TypeScript program (`ts.createProgram` + `getTypeChecker()`) for each of the 10 telemetry roots. Since each program independently resolves the same shared Kibana transitive dependencies, this resulted in ~212s of redundant sequential CPU work. This PR: - **Creates a single shared TS program** for all 69 collector files across all roots, then partitions the parsed results back by root - **Parallelizes globbing** across all roots via `Promise.all` - **Extracts `filterCollectorPaths` and `extractCollectorsWithProgram`** as reusable functions ### Benchmark results (local, 3 consistent runs) | Metric | Before | After | Improvement | |--------|--------|-------|-------------| | Total wall time | **210s** | **59s** | **-151s (72%)** | | TS type-check time | 212s (6 programs) | 50s (1 program) | -162s | | Glob phase | 0.7s (sequential) | 0.3s (parallel) | -0.4s | ### Validation - `node scripts/telemetry_check` — passes (no changes mode) - `node scripts/telemetry_check --fix` — passes, correctly detects and fixes schema drift - `schema_checks.test.ts` — all 13 tests pass - `kbn-telemetry-tools` unit tests — all 8 suites / 45 tests pass - Tested with a real collector change (added field to CSP collector) to verify end-to-end detection, fix, and schema JSON update ## Test plan - [ ] CI telemetry check passes on this PR - [ ] Verify telemetry check correctly detects schema drift when a collector is modified - [ ] Verify `--fix` correctly updates the JSON schema files - [ ] Verify `--path` flag still works for scoped checks Made with [Cursor](https://cursor.com) --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
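Summary
The CI telemetry check (`node scripts/telemetry_check`) was creating a separate TypeScript program (`ts.createProgram` + `getTypeChecker()`) for each of the 10 telemetry roots. Since each program independently resolves the same shared Kibana transitive dependencies, this resulted in ~212s of redundant sequential CPU work.

This PR:
- Creates a single shared TS program for all 69 collector files across all roots, then partitions the parsed results back by root
- Parallelizes globbing across all roots via `Promise.all`
- Extracts `filterCollectorPaths` and `extractCollectorsWithProgram` as reusable functions

Benchmark results (local, 3 consistent runs)

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Total wall time | 210s | 59s | -151s (72%) |
| TS type-check time | 212s (6 programs) | 50s (1 program) | -162s |
| Glob phase | 0.7s (sequential) | 0.3s (parallel) | -0.4s |

Validation
- `node scripts/telemetry_check`: passes (no-changes mode)
- `node scripts/telemetry_check --fix`: passes; correctly detects and fixes schema drift
- `schema_checks.test.ts`: all 13 tests pass
- `kbn-telemetry-tools` unit tests: all 8 suites / 45 tests pass
- Tested with a real collector change (added a field to the CSP collector) to verify end-to-end detection, fix, and schema JSON update

Test plan
- [ ] CI telemetry check passes on this PR
- [ ] Verify telemetry check correctly detects schema drift when a collector is modified
- [ ] Verify `--fix` correctly updates the JSON schema files
- [ ] Verify `--path` flag still works for scoped checks

Made with Cursor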
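Parallel globbing, by contrast, does benefit from `Promise.all`, because the work is mostly waiting on filesystem I/O and those waits can overlap. This sketch uses a hand-written recursive walker (`listTsFiles`, hypothetical) in place of whatever glob library and patterns the real check uses. It globs every root concurrently and returns the matches keyed by root.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical stand-in for the tool's per-root glob: recursively list
// .ts/.tsx files under `dir`. The real check uses its own glob patterns.
async function listTsFiles(dir: string): Promise<string[]> {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  const nested = await Promise.all(
    entries.map(async (entry) => {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) return listTsFiles(full);
      return /\.tsx?$/.test(entry.name) ? [full] : [];
    })
  );
  return nested.flat();
}

// Glob all roots concurrently; the awaits overlap filesystem I/O, which is
// where Promise.all helps (unlike synchronous CPU-bound extraction).
async function globAllRoots(roots: string[]): Promise<Record<string, string[]>> {
  const perRoot = await Promise.all(roots.map((root) => listTsFiles(root)));
  const byRoot: Record<string, string[]> = {};
  roots.forEach((root, i) => {
    byRoot[root] = perRoot[i].sort(); // sorted for deterministic output
  });
  return byRoot;
}
```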